Engineering posts about Large Language Models
Curated summaries and key learnings for engineers working with Large Language Models.
Accelerating LLM Inference with Prompt Caching for Open‑Source Models on Databricks
The article outlines the significance of prompt caching in accelerating inference for large language models (LLMs) on Databricks. It explains how repeated prompts can lead to inefficiencies in...
Databricks for Good and Virtue Foundation: Partnering to Connect Medical Volunteers to Critical Health Services in 72 Countries
The article outlines the collaboration between Databricks for Good and the Virtue Foundation to enhance global health delivery through an AI-enabled platform. The initiative focuses on aggregating...
How to safeguard AI workloads with Unity AI Gateway Guardrails
The article outlines the importance of implementing guardrails in AI applications to protect sensitive information and ensure compliance with security standards. It details how Unity AI Gateway...
Databricks Context Engineer Associate: the industry’s first certification for reliable AI agent systems
The article introduces the Databricks Certified Context Engineer Associate certification, the first of its kind aimed at enhancing the reliability of AI agent systems through effective context...
Creating a Multi-Tenant AI Agent Platform Handling 7K+ Sessions Without Cross-Team Interference
The article outlines the development of the Bring Your Own Planner (BYOP), a multi-tenant AI agent platform designed to enhance team autonomy and scalability within Salesforce. It addresses the...
Amazon Bedrock introduces new advanced prompt optimization and migration tool
Amazon Bedrock has introduced an advanced prompt optimization tool that allows users to enhance their prompts for various models simultaneously. This tool facilitates migration to new models or...
Build Long-running AI agents that pause, resume, and never lose context with ADK
This article presents a comprehensive guide to building long-running AI agents that can pause, resume, and maintain context using the Agent Development Kit (ADK). It highlights the limitations of...
Pushing the Frontier for Data Agents with Genie
The article presents Genie, a sophisticated data agent developed by Databricks, designed to enhance the analysis of both structured and unstructured enterprise data. It highlights the challenges...
How Superhuman and Databricks built a 200K QPS inference platform together
The article describes the collaboration between Superhuman and Databricks in developing a high-performance inference platform capable of handling over 200,000 queries per second (QPS) with stringent...
From Where Things Are to What They’re For: Benchmarking Spatial–Functional Intelligence for Multimodal LLMs
The paper introduces the Spatial-Functional Intelligence Benchmark (SFI-Bench), aimed at evaluating the advanced reasoning capabilities of multimodal large language models (MLLMs). It highlights the...
Generative AI for Business: A Complete Strategy and Implementation Guide
The article discusses the transformative potential of generative AI in business, highlighting its ability to create significant economic value across various sectors. It emphasizes the importance of...
LLM Vs AI: A Practical Guide to Differences, Use Cases, and Tools
This article is a guide to the distinctions between large language models (LLMs) and the broader field of artificial intelligence (AI). It outlines the scope, core...
Supercharging LLM inference on Google TPUs: Achieving 3X speedups with diffusion-style speculative decoding
The article discusses advancements in Large Language Model (LLM) inference acceleration through the implementation of block diffusion speculative decoding, specifically the DFlash method, on Google...
Reinforced Agent: Inference-Time Feedback for Tool-Calling Agents
The article introduces the concept of a Reinforced Agent that enhances tool-calling agents by incorporating inference-time feedback. This approach aims to address the limitations of traditional...
LaDiR: Latent Diffusion Enhances LLMs for Text Reasoning
The paper introduces LaDiR (Latent Diffusion Reasoner), a novel framework that enhances the reasoning capabilities of large language models (LLMs) by integrating latent diffusion models. It addresses...
Adaptive Thinking: Large Language Models Know When to Think in Latent Space
The article presents research on adaptive thinking in large language models (LLMs), particularly focusing on how these models can optimize their reasoning processes during inference. It introduces...
DigitalOcean Dedicated Inference: A Technical Deep Dive
The article delves into DigitalOcean's Dedicated Inference service, designed to efficiently manage large language model (LLM) inference at scale. It highlights the challenges of handling high...
ParaRNN: Large-Scale Nonlinear RNNs, Trainable in Parallel
The article presents ParaRNN, a novel framework developed by Apple researchers that significantly enhances the training efficiency of Recurrent Neural Networks (RNNs) by enabling parallelization...
A Practical Guide to LLM Fine Tuning
This article serves as a practical guide for ML engineers and AI practitioners focused on fine-tuning large language models (LLMs) for specific tasks. It outlines the entire lifecycle of LLM...
Are LLM agents good at join order optimization?
This article explores applying large language models (LLMs) to join order optimization in SQL queries, a long-standing challenge in database management. Traditional...